DETAILED ACTION



Claim Rejections - 35 USC § 112



The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

Claims 9-15 are rejected under 35 U.S.C. 112, first paragraph, as failing to 
comply with the enablement requirement. The claim(s) contains subject matter which 
was not described in the specification in such a way as to enable one skilled in the art to 
which it pertains, or with which it is most nearly connected, to make and/or use the 
invention. The disclosure fails to specifically disclose the step of obtaining the pitch 
value between one and five of each speech item. It is unclear how these values are 
used in the claimed invention. Appropriate correction is required. 



The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - (e) the invention was described in (1) an application for 
patent, published under section 122(b), by another filed in the United States before the invention by 
the applicant for patent or (2) a patent granted on an application for patent by another filed in the 
United States before the invention by the applicant for patent, except that an international application 
filed under the treaty defined in section 351 (a) shall have the effects for purposes of this subsection of 
an application filed in the United States only if the international application designated the United 
States and was published under Article 21(2) of such treaty in the English language. 



Claim Rejections - 35 USC § 102

Claims 1 and 8 are rejected under 35 U.S.C. 102(e) as being anticipated by
Coorman et al. (US Patent No. 6665641). 

1. Regarding claim 1, Coorman et al. disclose a method for converting text to
concatenated voice by utilizing a digital voice library and a set of playback rules (col. 8,
ln. 59 to col. 9, ln. 56), the digital voice library including a plurality of speech items and a
corresponding plurality of voice recordings wherein each speech item corresponds to at
least one available voice recording wherein multiple voice recordings that correspond to
a single speech item represent various inflections of that single speech item (col. 9, ln.
1-8), the method including receiving text data, converting the text data into a sequence
of speech items in accordance with the digital voice library (col. 9, ln. 13-25), the
method further comprising:

determining a syllable count for each speech item in the sequence of speech 
items (col. 23, ln. 35-45);

determining an impact value for each speech item in the sequence of speech
items (col. 9, ln. 33-44, or see the COST FUNCTION sections at col. 12-15; the
impact value is interpreted as how well the speech item fits in the concatenated
speech);

determining a desired inflection for each speech item in the sequence of speech 
items based on the syllable count and the impact value for the particular speech item 
and further based on the set of playback rules (col. 9, ln. 26-37);

determining a sequence of voice recordings by determining a voice recording for
each speech item based on the desired inflection for the particular speech item and
based on the available voice recordings that correspond to the particular speech item
(col. 9, ln. 33-37); and

generating voice data based on the sequence of voice recordings by 
concatenating adjacent recordings in the sequence of voice recordings (col. 9, ln. 51-56).

2. Regarding claim 8, Coorman et al. further disclose that a plurality of speech 
items includes a plurality of words, the method further comprising: 

determining a pitch value for each speech item in the sequence of speech items 
by normalizing the impact value for the particular speech item (col. 10, ln. 49-55 or col.
13, ln. 48-53), wherein the desired inflection for each speech item is further based on
the pitch value for the particular speech item (col. 10, ln. 43-53).
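For illustration only, the normalizing step recited for claim 8 can be sketched as follows. The sketch is not taken from the application or from Coorman et al.; it assumes a simple min-max scaling, borrows the one-to-five range mentioned in the § 112 rejection above, and uses a hypothetical function name (normalize_impact_to_pitch).

```python
def normalize_impact_to_pitch(impacts, low=1, high=5):
    """Map raw impact values onto integer pitch values in [low, high].

    Assumption: min-max scaling over the items of one sentence. The record
    does not say which normalization the applicant or the reference uses.
    """
    if not impacts:
        return []
    smallest, largest = min(impacts), max(impacts)
    if largest == smallest:
        # All items have equal impact; assign the midpoint of the range.
        return [(low + high) // 2] * len(impacts)
    span = largest - smallest
    return [low + round((value - smallest) * (high - low) / span)
            for value in impacts]
```

Under these assumptions, the item with the lowest impact value in a sentence receives a pitch value of 1 and the item with the highest receives 5.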

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over
Coorman et al. (US Patent No. 6665641) in view of Jacks et al. (US Patent No.
4692941).

3. Regarding claim 16, Coorman et al. disclose a method for converting text to
concatenated voice by utilizing a digital voice library and a set of playback rules (col. 8,
ln. 59 to col. 9, ln. 56), the digital voice library including a corresponding plurality of
voice recordings wherein each speech item corresponds to at least one available voice
recording wherein multiple voice recordings that correspond to a single speech item
represent various inflections of that single speech item (col. 9, ln. 1-8), the method
including receiving text data, converting the text data into a sequence of speech items in
accordance with the digital voice library (col. 9, ln. 13-25), the method further
comprising:

determining a syllable count for each speech item in the sequence of speech 
items (col. 23, ln. 35-45);

determining an impact value for each speech item in the sequence of speech
items (col. 9, ln. 33-44, or see the COST FUNCTION sections at col. 12-15; the
impact value is interpreted as how well the speech item fits in the concatenated
speech);

determining a pitch value within a range for each speech item in the sequence of
speech items by normalizing the impact value for the particular speech item (col. 13, ln.
48-53);

determining a desired inflection for each speech item in the sequence of speech
items based on the syllable count and the pitch value for the particular speech item and
further based on the set of playback rules (col. 9, ln. 26-37);

determining a sequence of voice recordings by determining a voice recording for 
each speech item based on the desired inflection for the particular speech item and 
based on the available voice recordings that correspond to the particular speech item 
(col. 9, ln. 33-37); and

generating voice data based on the sequence of voice recordings by
concatenating adjacent recordings in the sequence of voice recordings (col. 9, ln. 51-56).

Coorman et al. fail to specifically disclose that the digital voice library includes a 
plurality of speech items, including glue items and payload items and the playback rules 
dictate that the desired inflection for a glue item is based on the desired inflection for 
surrounding payload items and that the desired inflection for a payload item is based on 
the desired inflection for nearest payload items with priority being given to speech items 
having a greater pitch value such that the desired inflections are determined first for 
speech items having the greatest pitch value and, thereafter, are determined for speech 
items in order of descending pitch. 

However, Jacks et al. teach that the digital voice library includes a plurality of
speech items, including glue items and payload items (col. 4, ln. 48-59) and the
playback rules dictate that the desired inflection for a glue item is based on the desired
inflection for surrounding payload items and that the desired inflection for a payload item
is based on the desired inflection for nearest payload items with priority being given to
speech items having a greater pitch value such that the desired inflections are
determined first for speech items having the greatest pitch value and, thereafter, are
determined for speech items in order of descending pitch (col. 10, ln. 1-27). The
advantage of using the teaching of Jacks et al. in the modified Coorman et al. is to
make the synthesized speech sound more natural.

Since the modified Coorman et al. and Jacks et al. are analogous art because
they are from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Jacks et al. in order to make the synthesized speech
sound more natural.
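For illustration only, the ordering imposed by the playback rules recited for claim 16 can be sketched as follows. The sketch is not taken from the application, Coorman et al., or Jacks et al. The SpeechItem type and both function names are hypothetical, and resolving glue items only after all payload items is an assumption drawn from the statement that a glue item's inflection depends on the surrounding payload items.

```python
from dataclasses import dataclass


@dataclass
class SpeechItem:
    text: str
    pitch: int             # normalized pitch value (hypothetical 1..5 scale)
    is_glue: bool = False  # flag marking a glue item


def resolution_order(items):
    """Return item indices in the order their inflections would be decided."""
    payload = [i for i, item in enumerate(items) if not item.is_glue]
    glue = [i for i, item in enumerate(items) if item.is_glue]
    # Payload items first, greatest pitch first. The sort is stable, so
    # items with equal pitch keep their order in the sentence.
    payload.sort(key=lambda i: -items[i].pitch)
    # Assumption: glue items are decided last, once their neighbors are known.
    return payload + glue


def surrounding_payload(items, index):
    """Return the nearest payload indices to the left and right of index."""
    left = next((i for i in range(index - 1, -1, -1)
                 if not items[i].is_glue), None)
    right = next((i for i in range(index + 1, len(items))
                  if not items[i].is_glue), None)
    return left, right
```

In this sketch, surrounding_payload supplies the neighboring payload items that a glue item (or another payload item) would consult once resolution_order has fixed the sequence in which inflections are decided.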

4. Regarding claim 2, Coorman et al. fail to specifically disclose that the speech
items are glue items and a plurality of the speech items are payload items, the method
further comprising:

setting a flag for any speech item in the sequence of speech items that is a glue
item (Jacks et al., col. 4, ln. 48-50, the main point is to identify glue words), wherein the playback
rules dictate that the desired inflection for a glue item is based on the desired inflection
for surrounding payload items in the sequence of speech items and that the desired
inflection for a payload item is based on the desired inflection for nearest payload items
in the sequence of speech items (Jacks et al., col. 9, ln. 51 to col. 10, ln. 27). The advantage of using
the teaching of Jacks et al. in Coorman et al. is to analyze the structure of the sentence
and assign appropriate prosody to each word to make the synthesized speech sound
more natural.

Since the modified Coorman et al. and Jacks et al. are analogous art because
they are from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to modify Coorman et al. by
incorporating the teaching of Jacks et al. in order to analyze the structure of the
sentence and assign appropriate prosody to each word to make the synthesized speech
sound more natural.

Claims 3-5 and 17-19 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Coorman et al. (US Patent No. 6665641) in view of Jacks et al. (US Patent No.
4692941) and further in view of Minowa et al. (US Patent No. 6438522). 

5. Regarding claims 3 and 17, the modified Coorman et al. fail to specifically
disclose that a plurality of speech items includes a plurality of phrases. However,
Minowa et al. teach that a plurality of speech items includes a plurality of phrases (col.
7, ln. 6-10). The advantage of using the teaching of Minowa et al. in the modified
Coorman et al. is to allow the system to process phrase input speech items.

Since the modified Coorman et al. and Minowa et al. are analogous art because
they are from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Minowa et al. in order to allow the system to process
phrase input speech items.

6. Regarding claims 4 and 18, the modified Coorman et al. fail to specifically
disclose that a plurality of speech items includes a plurality of words. However,
Minowa et al. teach that a plurality of speech items includes a plurality of words (col. 7,
ln. 6-10). The advantage of using the teaching of Minowa et al. in the modified
Coorman et al. is to allow the system to process single word input speech items.

Since the modified Coorman et al. and Minowa et al. are analogous art because
they are from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Minowa et al. in order to allow the system to process
single word input speech items.

7. Regarding claims 5 and 19, the modified Coorman et al. fail to specifically
disclose that a plurality of speech items includes a plurality of syllables. However,
Minowa et al. teach that a plurality of speech items includes a plurality of syllables (col.
7, ln. 10-25). The advantage of using the teaching of Minowa et al. in the modified
Coorman et al. is to increase processing speed by using a syllable-based segmentation
scheme to reduce the number of speech models.

Since the modified Coorman et al. and Minowa et al. are analogous art because
they are from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Minowa et al. in order to increase processing speed
by using a syllable-based segmentation scheme to reduce the number of speech models.

Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Coorman 
et al. (US Patent No. 6665641) in view of Gasper et al. (US Patent No. 5278943). 

8. Regarding claim 6, Coorman et al. fail to specifically disclose that multiple voice
recordings that correspond to a single speech item represent various inflections of that
single speech item and wherein the various inflections belong to various inflection
groups including at least one standard inflection group, at least one emphatic
inflection group, and at least one question inflection group. However, Gasper et al.
suggest stored recordings having different prosodic environments (col. 13, ln. 18-29).
Thus, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to modify Coorman et al. by specifically making recordings of these
different inflections to provide the digital library a wide range of speech variations of
particular words to enhance speech synthesis capabilities and increase the system's
reliability.

Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Coorman
et al. (US Patent No. 6665641) in view of Gasper et al. (US Patent No. 5278943) and
further in view of Jacks et al. (US Patent No. 4692941).

9. Regarding claim 7, the modified Coorman et al. fail to specifically disclose that at
least one question inflection group includes a single word question inflection group and
a multiple word question inflection group. However, Jacks et al. teach that at least one
question inflection group includes a single word question inflection group and a multiple
word question inflection group (col. 9, ln. 45-50). The advantage of using the teaching
of Jacks et al. in Coorman et al. is to assign appropriate pitch to word(s) in a question to
make the speech sound more natural.

Since the modified Coorman et al. and Jacks et al. are analogous art because
they are from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Jacks et al. in order to assign appropriate pitch to
word(s) in a question to make the speech sound more natural.

Claim 20 is rejected under 35 U.S.C. 103(a) as being unpatentable over 
Coorman et al. (US Patent No. 6665641) in view of Jacks et al. (US Patent No. 
4692941), further in view of Minowa et al. (US Patent No. 6438522), and further in view 
of Gasper et al. (US Patent No. 5278943).

10. Regarding claim 20, the modified Coorman et al. fail to specifically disclose that
multiple voice recordings that correspond to a single speech item represent various
inflections of that single speech item and wherein the various inflections belong to
various inflection groups including at least one standard inflection group, at least one
emphatic inflection group, and at least one question inflection group. However, Gasper
et al. suggest stored recordings having different prosodic environments (col. 13, ln.
18-29). Thus, it would have been obvious to one of ordinary skill in the art at the time
the invention was made to modify Coorman et al. by specifically making recordings of
these different inflections to provide the digital library a wide range of speech variations
of particular words to enhance speech synthesis capabilities and increase the system's
reliability.



Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Huyen Vo, whose telephone number is 703-305-8665
and whose email address is huyen.vo@uspto.gov. The examiner can normally be reached
M-F, 9-5:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Doris To, can be reached on 703-305-4827. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Examiner Huyen X. Vo
February 12, 2004